fix: Parsing/pasting on prosemirror-model: 1.25.1
#1661
This feels wrong. Could you give a write-up on why this change is needed?
It feels like it litters the code everywhere, though I may just be misunderstanding something here.
Yep you were right, I went a bit overboard to make sure parse rules weren't being triggered accidentally and pretty much all of the |
Co-authored-by: Matthew Lipski <[email protected]>
Issue
This PR addresses breaking changes to parsing introduced by `prosemirror-model: 1.25.1`. Prior to this update, the `DOMParser` would drop any parsed nodes that were not valid in the schema, based on the parent node. For example, consider HTML in which an `li` element contains a `p` element.

When parsing this before the update, the `li` element would get parsed as a `bulletListItem` node. Within it, the `p` element should also get parsed as a `paragraph` node. However, because the schema dictates that `paragraph` nodes can't be within `bulletListItem` nodes, the `paragraph` is dropped altogether and ignored.

After the update, the `paragraph` node will no longer get dropped, and the `DOMParser` will instead attempt to insert it somewhere so that it is valid in the schema. Since it can't be a child of the `bulletListItem` node, it instead gets wrapped in a `blockContainer` and `blockGroup`, which then gets inserted into the `bulletListItem`'s parent `blockContainer`. In this scenario, we actually want to ignore the `p` tag and just parse its content.

Overall, the changes in `prosemirror-model: 1.25.1` mean we have to be more diligent when writing parse rules. Paragraphs especially appear in many places, e.g. external HTML paragraphs, internal HTML paragraphs, internal HTML list items, and internal HTML table cells.

Parse rule changes
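To make the behaviour change described under Issue concrete, the following is a toy model (not ProseMirror's actual `DOMParser`) of an `li` containing a `p`. It contrasts the new default outcome, where the paragraph is lifted into a child block, with the outcome we want, where the `p` tag is ignored and only its content is kept. The `HtmlNode` and `ParsedListItem` types, the helper names, the set of block tags, and the sample text are all hypothetical and exist only for illustration.

```typescript
// Toy model only: this is NOT ProseMirror's DOMParser. It just illustrates
// two possible outcomes for an `li` element that contains a `p` element.

// Hypothetical minimal HTML tree: an element with children, or a text leaf.
type HtmlNode =
  | { kind: "element"; tag: string; children: HtmlNode[] }
  | { kind: "text"; text: string };

const el = (tag: string, ...children: HtmlNode[]): HtmlNode => ({
  kind: "element",
  tag,
  children,
});
const txt = (value: string): HtmlNode => ({ kind: "text", text: value });

// Simplified stand-in for a parsed list item block (hypothetical shape).
interface ParsedListItem {
  content: string; // inline content of the list item itself
  children: string[]; // text of blocks nested under the list item
}

// Assumption for this sketch: tags treated as block-level content.
const BLOCK_TAGS = new Set(["p", "h1", "h2", "h3", "div"]);

// Concatenate all text found inside a node.
function textOf(node: HtmlNode): string {
  return node.kind === "text"
    ? node.text
    : node.children.map(textOf).join("");
}

// New default outcome: block-level children are lifted into child blocks.
function liftBlockChildren(li: HtmlNode): ParsedListItem {
  const result: ParsedListItem = { content: "", children: [] };
  if (li.kind === "text") {
    result.content = li.text;
    return result;
  }
  for (const child of li.children) {
    if (child.kind === "element" && BLOCK_TAGS.has(child.tag)) {
      result.children.push(textOf(child));
    } else {
      result.content += textOf(child);
    }
  }
  return result;
}

// Desired outcome: block-level tags are ignored, their content is kept inline.
function unwrapBlockChildren(li: HtmlNode): ParsedListItem {
  return { content: textOf(li), children: [] };
}

// Assumed sample input, equivalent to <li><p>Item text</p></li>.
const listItem = el("li", el("p", txt("Item text")));

const lifted = liftBlockChildren(listItem);
const unwrapped = unwrapBlockChildren(listItem);
console.log(lifted, unwrapped);
```

In this model, `lifted` ends up with empty inline content and one child block, while `unwrapped` keeps the text as the list item's own inline content.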
Internal HTML
After updating `prosemirror-model`, external HTML parse rules were being triggered when parsing `blockContent` nodes. This is because `blockContent` nodes can have all kinds of HTML inside, which we actually don't care about for parsing. To fix this, all default `blockContent` nodes have received the following change in their parse rules:

Before:
After:
This change was made following a suggestion by Marijn here.
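As a rough sketch of what such an internal HTML parse rule could look like, assuming it relies on the `contentElement` option that `prosemirror-model` parse rules accept: the `ParseRuleSketch` type and the `internalBlockContentRule` helper are hypothetical, and the `data-content-type` selector is an assumption about how a `blockContent` element is identified. Only the `bn-inline-content` class comes from this PR description.

```typescript
// Local, simplified mirror of the parse-rule fields used in this sketch.
// `tag` and `contentElement` are options on prosemirror-model tag parse rules;
// this standalone type exists only so the example runs without dependencies.
interface ParseRuleSketch {
  tag: string;
  contentElement?: string;
}

// Hypothetical helper that builds the internal HTML parse rule for a
// blockContent node type. The `data-content-type` attribute selector is an
// assumption, not something confirmed by the PR description.
function internalBlockContentRule(nodeName: string): ParseRuleSketch {
  return {
    tag: `div[data-content-type="${nodeName}"]`,
    // CSS selector for the descendant whose content should be parsed as the
    // node's content, instead of the matched element's own children.
    contentElement: ".bn-inline-content",
  };
}

const paragraphRule = internalBlockContentRule("paragraph");
console.log(paragraphRule);
```

A rule of this shape would be returned from the node's `parseHTML` alongside any external HTML rules, so that internal HTML is matched by its wrapper element rather than by tags such as `p`.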
When parsing a `blockContent` element, this now tells the `DOMParser` to ignore all descendant elements except the one with the `bn-inline-content` class, and only parse its content.

Additionally, any `bn-inline-content` elements with the `data-editable` attribute have had this attribute removed.

External HTML
Because of the new parsing behaviour, we've had to add additional logic to list items and table cells.
HTML `li` elements may have multiple block or inline elements in them, which is incompatible with our schema as `*ListItem` blocks can only contain inline content. By default, the new parsing behaviour lifts all nodes that are incompatible with the schema up, so e.g. any `p` and `h1` elements within an `li` are parsed as separate blocks as children of the `*ListItem` block. This has been modified to be more Notion-like, and you can find the logic for this explained in `getListItemContent.ts`.

The new default behaviour for table cells is the same as for list items, i.e. elements like `p` and `h1` get lifted as children of the parent `table` block. Unlike list items though, moving content that isn't compatible with the schema to the children doesn't really make sense, so we would rather drop it altogether. This is basically how it already worked before the `prosemirror-model` update. However, the content of each element is now appended on the same line, whereas before, content from block-level elements would be appended to a new line (we may want to look into this again in the future).

Additionally, there's a minor fix for media elements (`embed`, `img`, `audio`, and `video`) inside `figure` elements, which caused their respective `blockContent` nodes to be parsed twice.

Closes #1643
Closes #1645